Towards Efficient MapReduce Using MPI

Authors

  • Torsten Hoefler
  • Andrew Lumsdaine
  • Jack J. Dongarra
Abstract

MapReduce is an emerging programming paradigm for data-parallel applications. We discuss common strategies to implement a MapReduce runtime and propose an optimized implementation on top of MPI. Our implementation combines redistribution and reduce and moves them into the network. This approach especially benefits applications with a limited number of output keys in the map phase. We also show how anticipated MPI-2.2 and MPI-3 features, such as MPI_Reduce_local and nonblocking collective operations, can be used to implement and optimize MapReduce with a performance improvement of up to 25% on 127 cluster nodes. Finally, we discuss additional features that would enable MPI to more efficiently support all MapReduce applications.
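The idea of folding redistribution and reduce into a single collective step can be illustrated with a small simulation. The sketch below is not the authors' implementation and does not use MPI: it is plain Python in which each list element stands in for one rank, and the names `KEYS`, `map_local`, `combine`, `tree_reduce`, and `mapreduce` are hypothetical. It assumes a small, fixed key set, so that every rank can hold one counter per key and the reduction becomes an elementwise sum over fixed-size vectors, which is the kind of operation a reduction collective such as MPI_Reduce could perform along a tree.

```python
from typing import Callable, Dict, List, Sequence

# Assumption: the map phase emits only a small, fixed set of keys.
KEYS = ("a", "c", "g", "t")


def map_local(chunk: str) -> List[int]:
    """Map one rank's chunk and combine locally: one count per key."""
    counts = [0] * len(KEYS)
    for ch in chunk:
        if ch in KEYS:
            counts[KEYS.index(ch)] += 1
    return counts


def combine(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Elementwise reduce operator (here: sum of per-key counts)."""
    return [x + y for x, y in zip(a, b)]


def tree_reduce(per_rank: Sequence[Sequence[int]],
                op: Callable[[Sequence[int], Sequence[int]], List[int]]) -> List[int]:
    """Simulated binomial-tree reduction of all rank buffers onto rank 0."""
    bufs = [list(v) for v in per_rank]
    n = len(bufs)
    step = 1
    while step < n:
        # In each round, rank r absorbs the buffer of rank r + step.
        for r in range(0, n, 2 * step):
            partner = r + step
            if partner < n:
                bufs[r] = op(bufs[r], bufs[partner])
        step *= 2
    return bufs[0]


def mapreduce(chunks: Sequence[str]) -> Dict[str, int]:
    """Run map on every simulated rank, then reduce along the tree."""
    per_rank = [map_local(c) for c in chunks]
    return dict(zip(KEYS, tree_reduce(per_rank, combine)))


if __name__ == "__main__":
    print(mapreduce(["acgt", "aacc", "ttt", "g", "ca"]))
```

Because the per-rank buffers have a fixed size, no separate shuffle of key-value pairs is needed in this sketch; with many distinct keys that assumption no longer holds, which matches the abstract's remark that the approach mainly benefits applications with a limited number of output keys.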

Similar resources

A MapReduce and MPI Programming Model for Distributed Large Scale 3D Mesh Processing

Developing a high-performance platform for large-scale, high-intensity data processing is a priority for researching cost-effective parallel finite element methods (FEM). This paper introduces an efficient MapReduce-MPI based strategy for parallel 3D finite element mesh processing and demonstrates the potential benefits of this approach for optimally utilizing system resources. Preliminary experim...


MPI for Big Data: New tricks for an old dog

The processing of massive amounts of data on clusters with a finite amount of memory has become an important problem facing the parallel/distributed computing community. While MapReduce-style technologies provide an effective means for addressing various problems that fit within the MapReduce paradigm, there are many classes of problems for which this paradigm is ill-suited. In this paper we pres...


Genetic Algorithms with Mapreduce Runtimes

Data-intensive computing has played a key role in processing vast volumes of data by exploiting massive parallelism. Parallel computing frameworks have proven that terabytes of data can be routinely processed. MapReduce is a parallel programming model and associated implementation introduced by Google. Genetic Algorithms have increasingly been applied on parall...


Pattern matching of signature-based IDS using Myers algorithm under MapReduce framework

The rapid increase in wired Internet speed and the constant growth in the number of attacks make network protection a challenge. Intrusion detection systems (IDSs) play a crucial role in discovering suspicious activities and also in preventing their harmful impact. Existing signature-based IDSs have significant overheads in terms of execution time and memory usage mainly due to the pattern matc...


A Tree Algorithm Based on Parallel Cloud Computing Model

Cloud computing builds on parallel computing, distributed computing, and grid computing, and as it advances, the design of efficient distributed tree algorithms is receiving more and more attention. Constrained by the parallel assumption, parallel tree algorithms are not easy to express in MapReduce. Inspired by the Bulk Synchronous Parallel model, we propose an enhanc...



Publication date: 2009